This document provides a very brief overview of basic programming in R, adapted from training provided by Prof. Ross Ihaka.
Further information can be found at https://www.stat.auckland.ac.nz/~ihaka/?Teaching
One way to think of R is as a very powerful calculator. If we type a calculation into the console an answer is printed out.
# Performing calculations using mathematical notation
2 + 2
## [1] 4
# Precedence can be set using parentheses
(10 + 5) / 2
## [1] 7.5
Once R has completed the computation the answer is discarded. To ‘save’ a result you can assign a name using <- or = to create a variable. Variables can then be used in subsequent expressions.
# Create a variable by assigning a name 'a'
a <- 5^2
Notice no result is printed
# Print the result named 'a'
a
## [1] 25
# Calculate the square root of 'a'
sqrt(a)
## [1] 5
R operates on named data structures - the most basic of which is a vector. Vectors contain an indexed collection of values of the same type.
Vectors are usually created with the function c()
x <- c(1, 3, 5, 7, 9)
y <- c(TRUE, FALSE, TRUE)
Check the type of a variable
class(x)
## [1] "numeric"
class(y)
## [1] "logical"
Determine the number of elements in a vector
length(x)
## [1] 5
length(y)
## [1] 3
# Be careful of mixing types
c(1, 5.5, 7)
## [1] 1.0 5.5 7.0
c(1, TRUE, "Hello")
## [1] "1" "TRUE" "Hello"
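When types are mixed, R silently converts every element to the most flexible type present, roughly in the order logical, numeric, character. A minimal sketch of this behaviour (the variable names are illustrative only):

```r
# Logical values become numbers when combined with numbers
mixed_num <- c(TRUE, FALSE, 2)
mixed_num
## [1] 1 0 2
class(mixed_num)
## [1] "numeric"

# Anything combined with a character string becomes character
mixed_chr <- c(1, TRUE, "Hello")
class(mixed_chr)
## [1] "character"

# Because TRUE is treated as 1, sum() counts the TRUE values
sum(c(TRUE, FALSE, TRUE))
## [1] 2
```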
Creating vectors by a sequence
5:10
## [1] 5 6 7 8 9 10
seq(1, 10, by = 2)
## [1] 1 3 5 7 9
-12.5:12
## [1] -12.5 -11.5 -10.5 -9.5 -8.5 -7.5 -6.5 -5.5 -4.5 -3.5 -2.5
## [12] -1.5 -0.5 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5
## [23] 9.5 10.5 11.5
x
## [1] 1 3 5 7 9
x + 10
## [1] 11 13 15 17 19
sqrt(x)
## [1] 1.000000 1.732051 2.236068 2.645751 3.000000
1:10/2
## [1] 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
Values can be extracted from a vector using square brackets [ ]
x
## [1] 1 3 5 7 9
x[3]
## [1] 5
x[2:4]
## [1] 3 5 7
x[-3]
## [1] 1 3 7 9
x > 5
## [1] FALSE FALSE FALSE TRUE TRUE
x[x > 5]
## [1] 7 9
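Logical subsetting extends naturally to more than one condition. The sketch below uses a standalone vector v (an illustrative name) with the same values as x so that it runs on its own:

```r
v <- c(1, 3, 5, 7, 9)

# Combine conditions with & (and) or | (or)
v[v > 2 & v < 8]
## [1] 3 5 7

# which() returns the positions of the TRUE values rather than the values
which(v > 5)
## [1] 4 5
```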
names(x) <- c("Jan", "Mar", "May", "Jul", "Sep")
x
## Jan Mar May Jul Sep
## 1 3 5 7 9
x["May"]
## May
## 5
c(mean = mean(x), sd = sd(x))
## mean sd
## 5.000000 3.162278
quantile(x)
## 0% 25% 50% 75% 100%
## 1 3 5 7 9
cumsum(x)
## Jan Mar May Jul Sep
## 1 4 9 16 25
Tip: To get information about a function, type ? followed by the function name, e.g. ?mean
Dataframes are one of the ways R represents tabular data.
A dataframe is made up of rows and columns, both of which can be named. All columns have the same length, and each column holds a single type of data, although different columns may hold different types.
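As a small illustration, a dataframe can be built by hand with data.frame(). The values and column names below are made up for the example and are not taken from the rainfall file used next:

```r
# A small made-up dataframe (illustrative values only)
obs <- data.frame(Site = c("A", "B", "C"),
                  Rainfall = c(12.5, 0, 3.2),
                  Flagged = c(FALSE, TRUE, FALSE),
                  stringsAsFactors = FALSE)

# Three rows and three columns
dim(obs)
## [1] 3 3

# Each column has its own type
class(obs$Site)
## [1] "character"
class(obs$Rainfall)
## [1] "numeric"
class(obs$Flagged)
## [1] "logical"
```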
rain <- read.csv("Data/MIL_DailyRainfall.csv")
rain$Date <- as.Date(rain$Date)
# Display the start or end of a dataframe with head() and tail()
head(rain)
Dataframes can be subset using [row, column]
rain[1, 1]
## [1] "2017-01-01"
rain[1,]
head(rain[,1])
## [1] "2017-01-01" "2017-01-02" "2017-01-03" "2017-01-04" "2017-01-05"
## [6] "2017-01-06"
# Create a month column for calculating statistics
rain$Month <- factor(format(rain$Date, "%b"), levels = month.abb)
head(rain)
Note that you can call a column by name with $
# Calculate monthly total rainfall
rainTotal <- tapply(rain$Rainfall, rain$Month, sum)
# Create a dataframe with month and total rainfall
df <- data.frame(Month = month.abb,
                 Total = rainTotal)
head(df)
# Add min, mean and max to the dataframe
pcal <- cbind(df,
              Min = tapply(rain$Rainfall, rain$Month, min),
              Mean = tapply(rain$Rainfall, rain$Month, mean),
              Max = tapply(rain$Rainfall, rain$Month, max))
# Drop the rownames
row.names(pcal) <- 1:12
head(pcal)
##   Month Total Min     Mean  Max
## 1   Jan  88.5   0 2.854839 36.5
## 2   Feb  77.0   0 2.750000 29.5
## 3   Mar  96.0   0 3.096774 18.9
## 4   Apr 192.5   0 6.416667 45.5
## 5   May 104.0   0 3.354839 21.5
## 6   Jun  32.5   0 1.083333 13.0
# Same again, but with the monthly mean rounded to 2 decimal places
pcal <- cbind(df,
              Min = tapply(rain$Rainfall, rain$Month, min),
              Mean = round(tapply(rain$Rainfall, rain$Month, mean), 2),
              Max = tapply(rain$Rainfall, rain$Month, max))
# Drop the rownames
row.names(pcal) <- 1:12
head(pcal)
##   Month Total Min Mean  Max
## 1   Jan  88.5   0 2.85 36.5
## 2   Feb  77.0   0 2.75 29.5
## 3   Mar  96.0   0 3.10 18.9
## 4   Apr 192.5   0 6.42 45.5
## 5   May 104.0   0 3.35 21.5
## 6   Jun  32.5   0 1.08 13.0
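The monthly statistics rely on tapply(), which splits its first argument into groups defined by its second argument and applies a function to each group. A self-contained sketch with made-up values (rainfall and month here are illustrative vectors, not the columns of the rain dataframe):

```r
# Made-up daily values and the month each one belongs to
rainfall <- c(2, 0, 5, 1, 4, 0)
month <- factor(c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb"),
                levels = c("Jan", "Feb"))

# Total per month
tapply(rainfall, month, sum)
## Jan Feb
##   7   5

# Mean per month, rounded to 2 decimal places
round(tapply(rainfall, month, mean), 2)
##  Jan  Feb
## 2.33 1.67
```

Setting the factor levels explicitly keeps the groups in calendar order rather than alphabetical order, which is why the Month column was created with levels = month.abb.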
R has built-in plotting capabilities. To view more details about the available graphical parameters, try ?plot and ?par
x <- data.frame(Month = 1:12,
Flow = c(25.4, 38.6, 27.3, 56.5, 53.4, 48.7,
59.1, 62.1, 60.1, 69.8, 25.3, 13.5))
plot(x)
plot(x, main = "Manawatu at Teachers College", pch = 16, col = 1:12, type = "b",
xlab = "2017", ylab = "Avg Monthly Flow (cumecs)")
# Choosing colours for the points based on flow values
cols <- character(12)
cols[x$Flow < 30] <- "red"
cols[x$Flow > 60] <- "blue"
cols[x$Flow >= 30 & x$Flow <= 60] <- "black"
plot(x, main = "Manawatu at Teachers College Monthly Flow", xaxt = "n",
xlab = "2017", ylab = "Avg Monthly Flow (cumecs)")
rect(6, 0, 9, 72, col = "lightgrey")
text(7.5, 45, "Winter")
lines(x$Month, x$Flow)
points(x, pch = 16, col = cols)
abline(h = 30, lty = 2)
abline(h = 60, lty = 2)
text(2, 20, "Low flows")
text(2, 65, "High flows")
axis(1, x$Month, month.abb)
par(oma=c(0, 0, 0, 5))
plot(x, main = "Manawatu at Teachers College Monthly Flow", xaxt = "n",
xlab = "2017", ylab = "Avg Monthly Flow (cumecs)")
rect(6, 0, 9, 72, col = "lightgrey")
text(7.5, 45, "Winter")
lines(x$Month, x$Flow)
points(x, pch = 16, col = cols)
abline(h = 30, lty = 2)
abline(h = 60, lty = 2)
text(2, 20, "Low flows")
text(2, 65, "High flows")
axis(1, x$Month, month.abb)
legend(x = par('usr')[2], y = par('usr')[4], bty='n', xpd=NA, pch = 16, col = unique(cols),
legend = c("Low flow", "Moderate flow", "High flow"))
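The colour assignment used in these plots can also be written as a single nested ifelse() call. This is only a sketch of an alternative, using the same thresholds of 30 and 60; flow is a standalone copy of the Flow column so that the block runs on its own:

```r
flow <- c(25.4, 38.6, 27.3, 56.5, 53.4, 48.7,
          59.1, 62.1, 60.1, 69.8, 25.3, 13.5)

# Below 30 is red, above 60 is blue, everything else is black
cols <- ifelse(flow < 30, "red",
               ifelse(flow > 60, "blue", "black"))

# The order of first appearance matches the legend labels
# "Low flow", "Moderate flow", "High flow"
unique(cols)
## [1] "red"   "black" "blue"
```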
There are many different packages for plotting. Some of the most popular include:
ggplot2
htmlwidgets
ggplot2 is part of the tidyverse collection of packages. In brief, it works by mapping variables in a dataset to aesthetics (such as position and colour) and then adding layers, scales, facets, etc. to build up a plot. If you would like to learn more about tidyverse coding, I highly recommend R for Data Science by Hadley Wickham and Garrett Grolemund.
library(ggplot2)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class))
htmlwidgets are a collection of packages that provide an R interface to JavaScript libraries for creating interactive visualisations. There are over 50 widgets available, including dedicated options for mapping and time series. To see what widgets exist, visit htmlwidgets for R.
library(plotly)
plot_ly(data = iris, x = ~Sepal.Length, y = ~Petal.Length, color = ~Species, type = "scatter")
library(dygraphs)
dygraph(nhtemp, main = "New Haven Temperatures") %>%
dyRangeSelector(dateWindow = c("1920-01-01", "1960-01-01"))
library(leaflet)
leaflet() %>%
addTiles() %>% # Add default OpenStreetMap map tiles
addMarkers(lng=174.768, lat=-36.852, popup="The birthplace of R")
Functions in R are written as the function name followed by parentheses: function_name()
Function parameters, or arguments, are separated by commas and may be named.
We have already used a number of functions, such as:
x <- c(1:5)
length(x)
## [1] 5
plot(x, main = "A Plot", col = x, pch = 16)
Tip: To view a function’s arguments, type function_name() and press Tab while the cursor is inside the parentheses.
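Arguments can be matched by position or by name. The short sketch below uses the base functions seq() and sd() to show the difference, and args() as another way to list a function's arguments:

```r
# Arguments matched by position: from = 2, to = 10
seq(2, 10)
## [1]  2  3  4  5  6  7  8  9 10

# Named arguments can be given in any order
seq(by = 2, to = 10, from = 2)
## [1]  2  4  6  8 10

# args() shows a function's arguments and their default values
args(sd)
## function (x, na.rm = FALSE)
## NULL
```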
Vectorization is a fundamental principle in R. It refers to the ability to perform operations on a vector as a whole rather than element by element.
For example, we can perform vector arithmetic:
x <- c(1, 2, 3)
y <- c(10, 20, 30)
x+y
## [1] 11 22 33
Or we can pass a vector to a function and receive a result for each element:
sqrt(x)
## [1] 1.000000 1.414214 1.732051
The recycling rule is how R handles vector operations on vectors of different length. When the operation requires that both vectors be the same length R will automatically recycle the values of the shorter vector until both vectors are the same length.
For example:
x + 10
## [1] 11 12 13
x[c(TRUE, FALSE)]
## [1] 1 3
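Recycling also applies when both vectors have more than one element. A short sketch (a is an illustrative name) showing the clean case and the case where the lengths do not divide evenly:

```r
a <- c(1, 2, 3, 4)

# The shorter vector is repeated: 10, 20, 10, 20
a + c(10, 20)
## [1] 11 22 13 24

# Length 3 is not a multiple of length 2, so R still recycles
# but issues a warning that the longer object length is not a
# multiple of the shorter object length
c(1, 2, 3) + c(10, 20)
## [1] 11 22 13
```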
This example is taken from the book Professor Stewart’s Hoard of Mathematical Treasures by Ian Stewart. The solution code was provided by Prof. Ross Ihaka.
(8 × 8) + 13
(8 × 88) + 13
(8 × 888) + 13
(8 × 8888) + 13
(8 × 88888) + 13
(8 × 888888) + 13
(8 × 8888888) + 13
(8 × 88888888) + 13
The sequences of 8s can be decomposed into 8, 8 + 80, 8 + 80 + 800, etc.
Those values are the cumulative sum of the sequence 8, 80, 800, etc.
That sequence is equivalent to 8 × 10^0, 8 × 10^1, 8 × 10^2, etc.
So our code is as follows:
n = 8
cumsum(8 * 10^(1:n - 1))
## [1] 8 88 888 8888 88888 888888 8888888 88888888
8 * cumsum(8 * 10^(1:n - 1)) + 13
## [1] 77 717 7117 71117 711117 7111117 71111117
## [8] 711111117
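The exponent 1:n - 1 in this code relies on operator precedence: the colon operator is evaluated before subtraction, so the expression means (1:n) - 1 and yields the exponents 0 to n - 1. A quick check with a small n:

```r
n <- 4

# The sequence is built first, then 1 is subtracted from each element
1:n - 1
## [1] 0 1 2 3

# Parentheses change the meaning: a sequence from 1 to n - 1
1:(n - 1)
## [1] 1 2 3

# So the powers of ten start at 10^0
10^(1:n - 1)
## [1]    1   10  100 1000
```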
There are several control constructs available in R. The most common are:
if(condition){
expression
}
if(condition){
expression_1
}else{
expression_2
}
for(variable in vector){
expression
}
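As a concrete sketch of these constructs working together, the loop below classifies a few flow values. The vector flows and the thresholds of 30 and 60 are borrowed from the plotting example purely for illustration:

```r
flows <- c(25.4, 62.1, 48.7)

# Visit each value in turn and print a label for it
for (f in flows) {
  if (f < 30) {
    print("Low flow")
  } else if (f > 60) {
    print("High flow")
  } else {
    print("Moderate flow")
  }
}
## [1] "Low flow"
## [1] "High flow"
## [1] "Moderate flow"
```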
How can we solve the calculator curiosity problem using control flow?
The most logical method for solving our problem is using a for loop.
Again building up the 8, 88, 888… is key.
We can do this by multiplying the previous value by 10 and adding 8.
numSeq = 8
for (i in 1:8) {
print(8 * numSeq + 13)
numSeq <- 10 * numSeq + 8
}
## [1] 77
## [1] 717
## [1] 7117
## [1] 71117
## [1] 711117
## [1] 7111117
## [1] 71111117
## [1] 711111117
At some point you may want to start writing your own functions.
This can easily be accomplished in R using the structure:
functionName <- function(arg_1, arg_2 = defaultValue){
  # Expressions to do something
  return(results)
}
You can then call your function with the default value
functionName(x)
or if you would like to overwrite the default value
functionName(x, arg_2 = y)
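A concrete version of this template might look like the following. The function scale_values and its argument names are hypothetical, chosen only to show a default value being used and then overridden:

```r
# Hypothetical function: multiply every element of x by a factor
scale_values <- function(x, factor = 2) {
  result <- x * factor
  return(result)
}

# Called with the default value of factor
scale_values(c(1, 2, 3))
## [1] 2 4 6

# Called with the default overridden
scale_values(c(1, 2, 3), factor = 10)
## [1] 10 20 30
```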
Now that we can calculate the problem, how can we make a function to output any number of answers?
Our vectorised version is already a simple expression using 1 input:
curiosity_1 <- function(n) 8 * cumsum(8 * 10^(1:n - 1)) + 13
curiosity_1(5)
## [1] 77 717 7117 71117 711117
Our control flow solution is a bit more complicated. We need some way to store the results instead of just printing them out.
curiosity_2 <- function(n){
ans <- numeric(n)
numSeq = 8
for (i in 1:n) {
ans[i] <- 8 * numSeq + 13
numSeq <- 10 * numSeq + 8
}
return(ans)
}
curiosity_2(3)
## [1] 77 717 7117
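A simple way to gain confidence in the two versions is to confirm that they return the same values. The sketch below repeats both definitions so that it runs on its own, then compares them with all.equal():

```r
# Vectorised version
curiosity_1 <- function(n) 8 * cumsum(8 * 10^(1:n - 1)) + 13

# Loop version, storing each answer in a pre-allocated vector
curiosity_2 <- function(n){
  ans <- numeric(n)
  numSeq <- 8
  for (i in 1:n) {
    ans[i] <- 8 * numSeq + 13
    numSeq <- 10 * numSeq + 8
  }
  return(ans)
}

# Both approaches should agree
all.equal(curiosity_1(8), curiosity_2(8))
## [1] TRUE
```

Both versions assume n is at least 1, because 1:n counts downwards when n is 0.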